Writing is a time-consuming process; writing high-quality publications requires attention to detail at every step of the way, from the actual prose on paper to its layout in the document to the presentation of figures. In this guide we walk you through 10 aspects of writing a scientific article using LaTeX to format your work. We emphasize typing commands at the Unix command line as a way for you to peek under the hood of the LaTeX engine. This gives you (the author!) power over the production of your own academic documents.

This guide could be extremely long; there are many, many fantastic resources on typesetting. Here we have hand-selected 10 topics to help lower the barrier to a more efficient, higher-quality paper-writing workflow. Specifically, we focus on:

  1. The structure of a document
  2. What are all of these tools? tex, latex, pdflatex, xelatex, lualatex, etc
  3. LaTeX happy workflows for your paper
  4. git and version control
  5. Journal style
  6. On writing, an interlude
  7. Dos and Don’ts
  8. On bibliographies
  9. On figures
  10. Handy tools

We provide a list of links to graphical interfaces for LaTeX at the end of the document, although we emphasize the value of working at the command line.

To help you practice these commands, we have hands-on examples ready in a JupyterLab session through Binder, where you can follow along by processing documents in a terminal session. You can start this environment from the launch binder link in the readme.md file of the associated GitHub repository.

To use LaTeX on your own computer, you will need to install it. We highly recommend the TeX Live distribution, which is available for Linux, macOS (as MacTeX), and Windows.

1. Structure and Markup

A LaTeX document (or a .tex file) is a plain text document containing commands that tell the LaTeX processing program how to create a beautiful pdf. These commands can be “markup”, like \textbf{this is bold} for bold text or $\alpha + \beta \frac{1}{x^2}$ for typeset math; commands that tell LaTeX about document structure, like \section{Introduction}; or even commands that identify a bibliography, like \bibliography{refs_example.bib}.
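To make this concrete, here is a minimal sketch of a complete document. It is our own illustration, not the example.tex file from the repository, and the title and author are placeholders. Everything before \begin{document} is the preamble; everything between \begin{document} and \end{document} is the document text.

```latex
\documentclass{article}      % preamble: choose the document class

\title{A Minimal Example}    % placeholder title
\author{A. Author}           % placeholder author

\begin{document}             % the document text starts here
\maketitle                   % print the title block defined in the preamble

\section{Introduction}       % structure: a numbered section heading

This sentence has \textbf{bold text} and inline math,
$\alpha + \beta \frac{1}{x^2}$.

\end{document}
```

Because this file has no citations or cross-references, a single run of pdflatex (or latexmk -pdflatex) should be enough to produce the pdf.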

Once you have a plain text document with markup, you then process it using a set of programs to create a publishable output like a .pdf file. This figure shows an example of a LaTeX document and highlights different parts of the document and their role.

The Structure of a LaTeX document

After processing that document (via, say, the command latexmk -pdflatex example.tex, assuming that the document is called example.tex), one can see a pdf file like the following image:

The associated pdf document

Takeaways

  • A LaTeX document consists mainly of two parts: the preamble, which defines the styling, and the document text.
  • The LaTeX system separates concerns: you focus on writing content in a plain text document, and processing comes afterward.
  • Processing the typed document produces a layout that handles spacing automatically (as we will see later).

Practice

What does example.tex look like when compiled to a pdf document? Can you add a title or author? Can you make some text bold?1 You can practice by following these steps (and similar ones in later sections):

  1. Select the directory 1_structure in the JupyterLab window that opens when you click on launch binder in the readme.md file of the associated GitHub repository.
  2. Click on the Terminal icon in the JupyterLab pane.
  3. In the terminal, type latexmk -pdflatex example.tex and then look at the resulting pdf.

You can also clone the GitHub repository to your own machine and open a Terminal to get a Unix command prompt if you are using a Mac or Linux machine. Windows machines can also provide a Unix command prompt (for example, through the Windows Subsystem for Linux), but setting it up is a bit more involved.

2. Flavors and Programs: tex, latex, pdflatex, etc

Although the basic program that processes LaTeX markup is called latex, in daily use you will mostly find yourself using pdflatex, or perhaps xelatex or lualatex.

Donald Knuth created the tex program as a way to make beautiful scientific documents; Leslie Lamport later built latex on top of it, bundling many tex commands into fewer, simpler macros. Both originally produced documents in dvi format, which was often converted to PostScript for printing. Nowadays, pdf files are the best way to make a document that looks the same to everyone who views it on screen or prints it.

Here is a list of the common programs that one might use to create a pdf file from a latex document:

For example, at the command prompt in the Terminal, you might type pdflatex example.tex to create an example.pdf file (if you run it only once, the citation will show up as a ? and no bibliography will be printed).

Notice also:

The following figure shows how it may require three runs of pdflatex (plus a run of bibtex) to go from an example.tex file to an example.pdf file:

From LaTeX to PDF commands

You can replace those multiple commands with a single call to latexmk -pdflatex example.tex, which reruns pdflatex and bibtex as many times as needed.

Takeaways

  • Always use LaTeX markup: very rarely (if ever) should you need to dip into plain TeX
  • Always use PDF output (pdflatex) and PDF figures (or PNG … more on this later) rather than DVI or PS format for sharing generated documents

Practice

See the directory 2_texflavors and the readme.md file therein. Can you change the font and use xelatex to make a pdf, say, by trying latexmk -xelatex example.tex?
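One possible answer, as a sketch: xelatex (and lualatex) can use OpenType fonts through the fontspec package, which pdflatex does not support. The example below assumes the TeX Gyre Pagella font files that ship with TeX Live and selects them by file name, which avoids depending on how fonts are registered on your system. The font choice is ours, not the repository's.

```latex
% Compile with: latexmk -xelatex example.tex   (lualatex also works)
\documentclass{article}
\usepackage{fontspec} % OpenType font selection; requires xelatex or lualatex

% Assumption: the TeX Gyre Pagella .otf files from TeX Live are available.
% Naming the files directly avoids relying on system font lookup.
\setmainfont{texgyrepagella}[
  Extension      = .otf,
  UprightFont    = *-regular,
  BoldFont       = *-bold,
  ItalicFont     = *-italic,
  BoldItalicFont = *-bolditalic
]

\begin{document}
This text is set in TeX Gyre Pagella, with \textbf{bold} and \emph{italic} shapes.
\end{document}
```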

3. LaTeX workflows

A given scientific paper will require many files and often involves many authors. For example, several .tex files (for different sections), multiple figures (as .pdf files), and bibliographies (in .bib files) may all be part of the paper. Organizing these files in a consistent fashion will lead to a clear process when dealing with revisions at a later date.

As a specific example, a main.tex file might look like this:

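\documentclass{article}
\usepackage{graphicx} % needed for \includegraphics in results.tex
\title{My Title}
\begin{document}
\maketitle
\input{abstract}
\input{intro}
\input{results}
% ...
\bibliographystyle{plain} % bibtex needs a style; a journal may supply its own .bst
\bibliography{mybib}      % reads mybib.bib
\end{document}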

And results.tex might look like this:

\section{Results}

Figure~\ref{fig:vaccine_by_pop} shows that opposition to vaccination peaks at a population of 100,000.

\begin{figure}[!ht]
\centering
\includegraphics[width=.8\textwidth]{vaccine_by_pop.pdf}
\caption{Vaccination opposition by population}\label{fig:vaccine_by_pop}
\end{figure}

The number 100,000 and the figure vaccine_by_pop.pdf are derived from the R file vaccine_by_pop.R. This R file relies on data cleaned by vaccine_data_cleaning.py, in addition to data that are downloaded, cleaned, and merged from the web.

So how do we organize the data, the files, and the overall workflow? There are many possibilities, but we’re reminded by a slice of the Zen of Python:

Simple is better than complex. Complex is better than complicated. Flat is better than nested.

We provide two specific examples of workflows below, but first we note two aspects that will greatly improve your process. The first is to separate your data from your processing and from your presentation.

The second aspect, directly related to LaTeX, is to establish a predictable naming convention. For example, each output, such as a table or figure, is produced by one script with the same name (temp_vs_time.pdf <-> temp_vs_time.py), and LaTeX labels follow the same convention (\label{fig:temp_vs_time}). When editing the document, the path from a figure to its plotting script and related data is then clear.
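Here is a sketch of this convention in the LaTeX source. It assumes that a script named temp_vs_time.py has already written temp_vs_time.pdf next to the .tex file; the caption text is a placeholder.

```latex
\documentclass{article}
\usepackage{graphicx} % provides \includegraphics

\begin{document}

% Output temp_vs_time.pdf is produced by temp_vs_time.py (same base name)
Figure~\ref{fig:temp_vs_time} shows the temperature over time.

\begin{figure}[!ht]
  \centering
  \includegraphics[width=0.8\textwidth]{temp_vs_time.pdf}
  \caption{Temperature over time.}
  \label{fig:temp_vs_time} % the label reuses the same base name
\end{figure}

\end{document}
```

Searching the project for temp_vs_time then finds the figure, its label, the script, and (if named the same way) the data file in one go.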

On Directory Structure

Here are two examples of directory structures that have worked for us.

In the first example, we use Matt West’s directory structure, in which each version of the paper is kept in its own directory:

paper_topic_name_dir_name              | string used for repo, tex, and bib files
+ requirements.txt                     | number of pages,  etc
+ 1_submitted_paper
|   +-- paper_topic_name.tex
|   +-- refs_topic_name.bib
|   +-- journal_class.cls              | any files needed for the journal latex style
|   +-- figures
|   |   +-- temp_vs_time.pdf           | descriptive names for figures (not fig1.pdf, etc)
|   |   +-- error_vs_stepsize.pdf
|   |   `-- ...
|   +-- data                           | data files that generate the figures
|   |   +-- Makefile                   | Makefile that will re-generate all figures
|   |   +-- temp_vs_time.csv           | use the same name as the resulting figure
|   |   +-- plot_temp_vs_time.py       | plotting scripts, use names like plot_*.py
|   |   `-- ...
|   `-- submitted_paper_topic_name.pdf | actual PDF file submitted
+ 2_reviews
|   +-- review_1.pdf                   | individual reviews
|   +-- review_2.pdf
|   `-- editor_statement.pdf           | instructions and summary from editor
+ 3_response_to_reviews
|   +-- response_topic_name.tex
|   `-- sent_response_topic_name.pdf   | actual PDF file sent to editor
` 4_revised_paper
    +-- paper_topic_name_revised.tex
    +-- refs_topic_name_revised.bib
    +-- journal_class.cls              | copy here any other files needed
    +-- figures                        | copy here all the figures again
    |   +-- temp_vs_time.pdf           | edit figures as needed
    |   +-- error_vs_stepsize.pdf
    |   `-- ...
    +-- data                           | copy all data again and edit as needed
    |   `-- ...
    `-- submitted_paper_topic_name_revised.pdf | actual PDF submitted

Reference: Matt West @ https://lagrange.mechse.illinois.edu/latex_quick_ref/

An alternative approach uses git branches for different versions, and a single Makefile for all tasks (from turning the paper into a pdf file via LaTeX, to creating figures, etc.). See also the discussion in Bowers and Voors (2016), section 3.

paper_topic_name_dir_name              | string used for repo, tex, and bib files
+ Makefile                             | file that tracks file relationships
+-- Data                               | directory for data and data cleaning, merging work
    + README.md                        | file with instructions and explanations
    + merge_data.R                     |
    + orig_data.csv                    | original data set, not to be changed
    + merge_data.csv                   |
    `-- ...                            |
+-- Analysis                           |
    + README.md                        |
    + linear_simulations.R             | file that runs simulations and saves output
    + linear_simulations.rda           | output from linear_simulations.R
    `-- ...                            |
+-- Figures                            |
    + README.md                        |
    + linear_simulations_N100.R        | file creating a figure
    + linear_simulations_N100.pdf      | the figure from linear_simulations_N100.R
    + descriptives.R                   | file creating a table
    + descriptives.tex                 | the table in LaTeX format
    `-- ...                            |
+-- Paper                              |
    + README.md                        |
    + main.tex                         | the main LaTeX file
    + abstract.tex                     | the abstract file
    `-- ...                            |
+-- References                         |
    + big.bib                          | bibliography file
    `-- ...                            |

Takeaways

  • Separate data from processing from presentation.
  • Use consistent names that connect each LaTeX label to the script that generates the figure or table and to the associated data.
  • Commit to a workflow! Anything is better than nothing, and, as the Zen of Python says, “Now is better than never.”

Practice

See the directory 3_workflows and the readme.md file therein.

4. On collaboration

Here we discuss our process for writing papers with others (and with ourselves in the future, as we revise papers, return to old papers, respond to re-analyses, etc.).

Collaborating asynchronously: git version control

We use git version control, via the GitHub interface, as part of our collaboration (with others, or with future versions of ourselves responding to reviewers, making revisions, etc.).

We do not describe git in depth here. Given some knowledge of git, we advise the following practices:

What to track:

  • Your .tex file
  • The .bib file (either a global one used across your projects or one created for your specific project)
  • Figures -> ./figures/*.pdf
  • Scripts for the figures -> ./data/*.py , ./data/*.R
  • Data for the figures -> ./data/*.csv

What not to track:

  • Any data with personally identifying information or information that cannot ethically be made public.
  • The pdf of the paper -> paper_randnoise.pdf
  • Any typesetting output -> *.log, *.bbl, *.aux, etc

Tips for using git to track versions and changes among co-authors in writing a LaTeX paper:

  • Agree with your co-authors about how you will organize the text in your documents. We have used one of the following:
    • one sentence per line
    • hard wrapping at, say, 80 characters
    • nothing, a free-for-all (plus or minus automatic reformatting or checking of files with pre-commit hooks).
  • Commit often.
  • For large edits, work on one section at a time to reduce merge conflicts.
  • Before you commit and push changes for your collaborators, do a clean recompile at the command line to make sure the document still builds.
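For example, under the one-sentence-per-line convention, a paragraph in the .tex file might look like the sketch below (placeholder text). LaTeX treats a single line break as a space, so the pdf shows an ordinary paragraph, while git reports an edit to one sentence as a change to one line.

```latex
\documentclass{article}
\begin{document}

% One sentence per line: each sentence sits on its own line in the source.
% LaTeX joins these lines into a single paragraph in the output.
This is the first sentence of the paragraph.
This second sentence can be edited without touching the first.
A co-author's change to this third sentence shows up as a one-line diff.

% A blank line (not a single line break) starts a new paragraph.
This sentence begins the next paragraph.

\end{document}
```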

Collaborating synchronously

Fewer tools allow collaborators to edit plain text documents at the same time. We nearly always rely on asynchronous collaboration, even if we have broken up a task and the whole team is working on it at the same time, even in the same room.

Overleaf is designed for this task. It compiles LaTeX and syncs with github. See also the online versions of LaTeX listed here.

There are other systems for editing plain text at the same time such as Teletype for Atom.

Takeaways

  • Writing a scientific paper is an act of collaboration — with a team at the same time, with yourself in the future, perhaps with those who will download your data and code and try to learn from it after it is published. So, you have to think about and talk about collaboration when you set out to write a paper.
  • Use the git system for version control whether or not you also work synchronously.

Practice

See the directory 4_git and the readme.md file therein.

On Writing: An interlude…

You already know Hemingway’s famous quote: “the only kind of writing is re-writing”. However, you might not know about linters.

On Linters

A linter is a program that analyzes your text, often while you are writing it, and flags likely problems. When misspelled words are highlighted in your email client, you are seeing a linter alerting you to improve your text. Linters are also used in programming, where they catch code errors before you run the code, for example by alerting you to unmatched parentheses or missing semicolons.

Other linters can look for style problems. Consider the following terrible sentence:

More research is needed to fill the gap created in extant literature in order to impact policy with very important findings.

One of our linters, the write-good linter, alerted us to some problems:

col 16 error| [write-good] "is needed" may be passive voice [E]
col 71 error| [write-good] "in order to" is wordy or unneeded [E]
col 102 error| [write-good] "very" is a weasel word and can weaken meaning [E]

Of course, linters cannot do it all. We like them because they draw attention to sentences which may need work. We bet that most readers of this guide would be able to re-write that sentence without using the passive voice, without using “impact” as a verb (yuck!), and with a stronger justification for research than to just fill a gap in the literature.

Tips and Tricks

Some tips that work for us:

  • Read paragraphs and sentences out loud to “edit by ear” (Becker 1986).
  • Step away from your writing for a day or more so that you come back to it with fresh eyes.
  • We recommend the following two guides to academic writing: Gopen and Swan (1990) and Becker (1986).
  • We routinely use alex, proselint, and write-good for our writing. LanguageTool also looks promising.
  • Avoid constantly re-compiling your document to see how it looks unless you are working on difficult diagrams and mathematics. Your first task is writing, not reading.
  • Mark open items and second-pass items with %TODO. You can find all places where you have %TODO in your document using: grep TODO paper_randnoise.tex
  • Work in stages: make your contributions clear, outline, then write and revise.
  • Polish the document and make it visually appealing to read at the end.
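For instance, %TODO markers are ordinary LaTeX comments: everything after % on a line is ignored by LaTeX, so the notes never appear in the pdf. A sketch with placeholder text:

```latex
\documentclass{article}
\begin{document}

% Everything after % is a comment, so these TODO notes never reach the pdf.
This sentence still needs a citation. % TODO: add reference

% TODO: second pass, shorten this paragraph
This paragraph is a first draft.

\end{document}
```

Running grep TODO on this file would list both TODO lines, which makes a quick checklist before the second pass.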

Takeaways

  • Peer-reviewed publications are critically important to science: treat your writing and presentation of results with care.
  • Peer-reviewed publications take reviewer and editor time: treat your writing and presentation of results with care.
  • (Hopefully) many people will read your publication: treat your publication with care.

Practice

See the directory 6_linting and the readme.md file therein.

LaTeX dos and don’ts

Here are some facts about how to use LaTeX. Obviously not just opinions. :)

DO keep your LaTeX readable!

DON’T overuse macros

DO use packages for consistent layouts

DON’T use \begin{align} for everything, instead try specific environments built for your purpose.

DO use consistent fonts throughout (including within figures).
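A small sketch of two of these points: one short, clearly named macro (\R is our own illustrative choice), equation for a single displayed equation, and align only when several equations should line up at a common symbol. It assumes the amsmath and amssymb packages, which are part of standard TeX Live.

```latex
\documentclass{article}
\usepackage{amsmath}  % equation, align, \eqref
\usepackage{amssymb}  % \mathbb

% One short, clearly named macro (our own example); avoid many cryptic ones.
\newcommand{\R}{\mathbb{R}}

\begin{document}

% A single displayed equation: use equation, not align.
\begin{equation}
  f(x) = \alpha + \beta \frac{1}{x^2}, \qquad x \in \R \setminus \{0\}.
  \label{eq:model}
\end{equation}

% Several equations lined up at the = sign: this is what align is for.
\begin{align}
  a &= b + c, \label{eq:first}\\
  d &= e - f. \label{eq:second}
\end{align}

Equation~\eqref{eq:model} is referenced by its label.

\end{document}
```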

Takeaways

TODO

Practice

See the directory 7_dos and the readme.md file therein.

On citations and bibliographies

The LaTeX system allows you to (1) insert citations in your text using commands like \cite{ChOlSe_2021_lsrbm}, which can turn into [7], (Chaudhry et al., 2021), [Ch21], or other citation styles within the text itself, and (2) print your bibliography, formatted according to your journal’s guidelines, using a single command in the LaTeX document like \bibliography{mybib.bib}. Separating formatting from information saves time: hundreds of citations are printed automatically in the correct format, and the bibliography includes only the sources you actually cite. If you decide that you no longer need a citation, it is automatically removed from your bibliography. Journals often provide their formatting rules as .bst files that you name in the \bibliographystyle{} command.

The program bibtex reads the .aux file created by the latex program and writes a .bbl file, which the latex program then reads to format the citations and bibliography. (Documents that use the biblatex package rely on biber instead, which reads a .bcf file.) This is why, as shown above, you need to run latex, bibtex, latex, and latex, in that order.

In order to use this system, you need a plain text file that is a database with entries formatted in BibTeX format. For example, here is one entry in the BibTeX file for this essay:

@article{ChOlSe_2021_lsrbm,
    author = {Chaudhry, Jehanzeb H. and Olson, Luke N. and Sentz, Peter},
    doi = {10.1137/20M1323552},
    journal = {SIAM Journal on Scientific Computing},
    number = {2},
    pages = {A1081-A1107},
    title = {A Least-Squares Finite Element Reduced Basis Method},
    url = {https://doi.org/10.1137/20M1323552},
    volume = {43},
    year = {2021}}
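As a minimal sketch, assuming the entry above is saved in a file named refs.bib next to the .tex file, a document can cite it like this. Here plain is one of the standard BibTeX styles; a journal would typically supply its own .bst file instead. Compile with pdflatex, bibtex, pdflatex, pdflatex (or a single latexmk -pdflatex) so that the citation resolves.

```latex
\documentclass{article}
\begin{document}

Reduced basis methods can also be built on least-squares
finite elements \cite{ChOlSe_2021_lsrbm}.

\bibliographystyle{plain} % standard BibTeX style; swap in the journal's .bst
\bibliography{refs}       % reads refs.bib (no extension needed)

\end{document}
```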

General workflow

The Structure of a LaTeX document with BIB

Takeaways

  • You will only need to add a BibTeX entry to your bibliography database (your .bib file) once. (And you can use tools like Zotero and BibDesk to make managing those collections of bibliographic information easier.)

Practice

See the directory 8_citations and the readme.md file therein.

On Figures and Tables and Math

Figures, tables, and math break up the text of a document and convey information that can make or break your attempts to persuade with your paper. Here are a few suggestions for making these elements work for you instead of against you.

In general, if a figure or table has been created using code, your project should have a figure- or table-creation file like linear_simulations_N100.R, which creates one figure in pdf format, linear_simulations_N100.pdf. This figure-creation file might require as input another file with simulation results, and in turn the file that creates the simulation results may need some data. We record these dependencies among files in our Makefile. For example, line 1, Data/clean_data.csv: Data/clean_data.R Data/raw_data.csv, means that the file Data/clean_data.csv depends on Data/clean_data.R and Data/raw_data.csv (it is created by the .R file and the .csv file together). Line 2 is the command used to create Data/clean_data.csv (in this case, R --file=Data/clean_data.R). Note that make requires each command line to begin with a tab character, not spaces.

Data/clean_data.csv: Data/clean_data.R Data/raw_data.csv
	R --file=Data/clean_data.R

Analysis/linear_simulations.rda: Analysis/linear_simulations.R Data/clean_data.csv
	R --file=Analysis/linear_simulations.R

Figures/linear_simulations_N100.pdf: Figures/linear_simulations_N100.R Analysis/linear_simulations.rda
	R --file=Figures/linear_simulations_N100.R

In general, figures, tables, and math should appear close to where they are discussed in the text. Do not put them at the end of your document unless you want a grumpy reader: most people read pdf documents on screens.

Figures

  • Tell LaTeX where to look for graphics using the \graphicspath{} command in the preamble. For example, we use \graphicspath{{./}{../Figures/}} (each directory must end with a slash).

  • We insert graphics into documents using the \includegraphics[]{} command. For example, to include a figure scaled to about 1/3 of the width of the text (the area within the left and right margins), we would use \includegraphics[width=0.33\textwidth]{myfig.pdf}.

  • Fonts in figures should match the fonts in the article. Note that using \includegraphics to scale a figure also changes the size of the text inside it, so make sure your figure text stays easy to read.

  • Place the float environment right after the paragraph in which the figure is first referenced. For example:

Figure~\ref{fig:vaccine_by_pop} shows that opposition to vaccination peaks at a population of 100,000.

\begin{figure}[!ht]
\centering
\includegraphics[width=.8\textwidth]{vaccine_by_pop.pdf}
\caption{Vaccination opposition by population}\label{fig:vaccine_by_pop}
\end{figure}
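  • Generally use \begin{figure}[!ht] or \begin{table}[!ht], where

  • ! lets LaTeX ignore its usual restrictions on how much of a page floats may take up

  • h places the float “here”, where it appears in the source, if it fits

  • t otherwise places it at the “top” of a page, deferring it to a later page if it does not fit on the current one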

  • Don’t use \FloatBarrier and other tricks like \newpage, \vspace or \hspace for spacing

  • Use consistent color schemes in all figures throughout the paper.

  • Label everything.

  • Do not introduce new notation in a figure or its caption. A reader should not have to hunt in the text to understand a figure.

  • The figure caption should describe, not discuss. A reader should not have to hunt in the text to understand a figure.
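Putting the figure advice together, a sketch of a complete document might look like this. It assumes a file named myfig.pdf in either the current directory or ../Figures/ (both the file and the directory layout are illustrative). It uses \centering inside the float rather than a center environment around it, and places \label after \caption so that references point to the figure number.

```latex
\documentclass{article}
\usepackage{graphicx}             % provides \includegraphics
\graphicspath{{./}{../Figures/}}  % search here, then ../Figures/ (trailing slashes required)

\begin{document}

Figure~\ref{fig:myfig} shows the main comparison.

% Float placed right after the paragraph that first references it.
\begin{figure}[!ht]
  \centering                                        % center inside the float
  \includegraphics[width=0.33\textwidth]{myfig.pdf} % about 1/3 of the text width
  \caption{A short caption that describes the figure.}
  \label{fig:myfig}                                 % after \caption
\end{figure}

\end{document}
```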

A terrible figure

Tables

  • If a table contains elements (like numbers) generated from code, then it should be generated entirely from code and saved in its own file.
    • In R, for example, we might use the xtable package to convert a matrix or data-frame to a LaTeX formatted table.
  • Tables should rarely have vertical lines, and in fact, tables should use as few lines as possible. (See this nice short guide on tables).
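A sketch of a table in this style, using the booktabs package for its three horizontal rules and no vertical lines. The numbers are placeholders for layout only, not results; in practice a script (for example, one using xtable in R) would write the table to its own .tex file, which you would then \input.

```latex
\documentclass{article}
\usepackage{booktabs} % \toprule, \midrule, \bottomrule

\begin{document}

\begin{table}[!ht]
  \centering
  \caption{Placeholder values, for layout only.} % table captions often go above
  \label{tab:placeholder}
  \begin{tabular}{lrr} % l = left-aligned text, r = right-aligned numbers
    \toprule
    Method   & Estimate & Std.\ error \\
    \midrule
    Method A & 1.00     & 0.10        \\
    Method B & 2.00     & 0.20        \\
    \bottomrule
  \end{tabular}
\end{table}

\end{document}
```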

The table caption should describe, not discuss. A reader should not have to hunt in the text to understand a table.

Math

Math fonts should work with the main font of the article. For examples of good math and text font pairings see the LaTeX Font Catalogue.
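For example, with pdflatex one matching pair is the newpxtext and newpxmath packages, which provide a Palatino-style text font and a math font designed to go with it. This is a sketch assuming both packages are installed (they are part of a full TeX Live installation); the Font Catalogue lists many other pairings.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{newpxtext} % Palatino-style text font
\usepackage{newpxmath} % matching math font, loaded after newpxtext

\begin{document}
Text and math share one design: $\alpha + \beta \frac{1}{x^2}$.
\end{document}
```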

Takeaways

TODO

Practice

See the directory 9_figures and the readme.md file therein.

Ways to type a document using LaTeX markup

A LaTeX document is a plain text file. This means that you can use any text editor to write a LaTeX document. However, a text editor that (1) recognizes that \textbf{} is a LaTeX command or that (2) keeps track of matching braces and parentheses makes it easier to write LaTeX markup. To that end, we use neovim (sometimes with the vimr gui) with vimtex plugins but we know that there are many other approaches to typing a plain text document using LaTeX markup.

Our friends who use LaTeX like the following systems. Each person prefers to interact with their computer differently, so we merely list what we’ve heard about here.

  • Emacs with AUCTeX
  • Neovim or Vim with vimtex or texlab
  • Texpad (we think this hides too many errors and warnings, so it is most useful if you are ready to go back to the command line)
  • TeXShop (we think this hides too many errors and warnings, so it is most useful if you are ready to go back to the command line)
  • TeXstudio
  • Atom
  • Sublime Text 3

Journal Style

The journal will have a style file. For example, see: https://www.siam.org/publications/journals/about-siam-journals/information-for-authors#dnn_ctr2112_ContentPane

Following both the journal’s style file and its instructions for authors will speed up the review and copy editing.

Practice

See the directory 5_style and the readme.md file therein.

Extra:

Helpful tools

Information about this document

We wrote this document using pandoc flavored markdown and turned it from plain text into HTML via the following command at the unix command line on our OS X laptops:

pandoc latex-guide.md --to html4 --from markdown+yaml_metadata_block+autolink_bare_uris+tex_math_single_backslash+inline_code_attributes --output latex-guide.html  --self-contained --variable bs3=TRUE --standalone --section-divs --template latex-guide-template.html   --include-in-header latex-guide-header.html --number-sections --table-of-contents --toc-depth=1  --variable theme=bootstrap --mathjax --variable 'mathjax-url:https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML' --citeproc

Alternatively, if you have access to R, you can do the following to turn this markdown document into HTML.

Rscript -e "library(rmarkdown); render('latex-guide.md')"

References

Becker, Howard S. 1986. Writing for Social Scientists: How to Start and Finish Your Thesis, Book, or Article. University of Chicago Press.
Bowers, Jake, and Maarten Voors. 2016. “Six Steps to a Better Relationship with Your Future Self, v 2.0.” Revista de Ciencia Política 36 (3): 829–48. http://static.jakebowers.org/PAPERS/11-BOWERS-RCP-363.pdf.
Gopen, George D., and Judith A. Swan. 1990. “The Science of Scientific Writing.” American Scientist 78 (6): 550–58.

  1. Try out \title{Some Paper} and \author{Some Person} in the preamble and \maketitle just after the \begin{document} line.